Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eurostatv2 pr #1094

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

shapateriya
Copy link

No description provided.

@beets
Copy link
Contributor

beets commented Oct 17, 2024

Thanks for the PR and getting this import done quickly!

A few initial comments:

  • Please run the formatter (./run_tests.sh -f)
  • Please add details to the PR description. Specifically, I'd like to see:
    • Source file URL
    • Any commands used to generate these files. The reviewer (or anyone after) should be able to duplicate the output files based on the instructions here.
    • Any additional manual updates done as part of the import.
    • Verifications done on the output (please include artifacts where possible)

In the future, all scripts should be accompanied by tests. We can skip it this time due to the time crunch, but I would like us to come back to revisit this in the next month.

fyi @manishvats2

Copy link
Contributor

@beets beets left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! these changes look good, especially the updates on measurement methods and stat vars. i do want us to think through the update on Count_Person_Employed --> dc/nm9hcklgg5zb3 (/cc @ajaits @hareesh-ms)

please use git lfs for the input and output tsv / csv's. we should have examples of these in the repo

@@ -126,8 +129,15 @@ def clean_data(preprocessed_df, output_path):

# replace colon with NaN.
clean_df = clean_df.replace(':', '')

clean_df['geo'] = 'dcid:nuts/' + clean_df['geo']
# for ind, geo in enumerate(clean_df['geo']):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove the commented out code

@@ -37,24 +40,37 @@
'Count_Person_Employed_NACE/O-Q',
'Count_Person_Employed_NACE/O-U',
'Count_Person_Employed_NACE/R-U',
'Count_Person_Employed',
'dc/nm9hcklgg5zb3',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a comment that this is "Population: Employed"

@@ -72,7 +82,12 @@ def download_data(self):
"""Downloads raw data from Eurostat website and stores it in instance
data frame.
"""
self.raw_df = pd.read_table(self.DATA_LINK)
# self.raw_df = pd.read_table(self.DATA_LINK)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove

self.raw_df = pd.read_table("nama_10r_3gdp.tsv.gz")
self.raw_df = self.raw_df.rename(columns=({'freq,unit,geo\TIME_PERIOD': 'unit,geo\\time'}))
self.raw_df['unit,geo\\time'] = self.raw_df['unit,geo\\time'].str.slice(2)
# return raw_df
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove

@@ -174,13 +191,16 @@ def generate_tmcf(self):
assert col in ['geo', 'time']
continue
col_num += 1
# Amount_EconomicActivity_GrossDomesticProduction_Nominal_AsAFractionOf_Count_Person
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove

@@ -95,5 +110,9 @@ def get_template_mcf():


if __name__ == "__main__":
# _DATA_URL = "https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file=data/demo_r_d3dens.tsv.gz"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants